Jingyi Zhang

R1-SyntheticVL: Is Synthetic Data from Generative Models Ready for Multimodal Large Language Model?

Feb 03, 2026
Via arXiv

STILL: Selecting Tokens for Intra-Layer Hybrid Attention to Linearize LLMs

Feb 02, 2026
Via arXiv

MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM

Jul 16, 2025
Via arXiv

R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO

May 22, 2025
Via arXiv

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Mar 17, 2025
Via arXiv

AI Guide Dog: Egocentric Path Prediction on Smartphone

Jan 14, 2025
Via arXiv

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Dec 24, 2024
Via arXiv

RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation

Dec 11, 2024
Via arXiv

Historical Test-time Prompt Tuning for Vision Foundation Models

Oct 27, 2024
Via arXiv

Open-Vocabulary Object Detection via Language Hierarchy

Oct 27, 2024
Via arXiv